PADRE | A Parallel Document Retrieval Engine
Abstract
Developments in text retrieval on the AP1000 since last year's PCW are reported. The software, now called PADRE, has been entered in the competition associated with the 1994 Text Retrieval Conference (TREC-3). PADRE is now capable of document relevance estimation and ranking, and supports data loading from and dumping to the Fujitsu Local Filesystem. A new load balancing operation has been devised and implemented, and improved techniques for handling cell-program error conditions have been adopted. Experiments have been carried out successfully on document collections exceeding 1.5 million documents and 5 gigabytes of data. Performance results are presented.
Similar resources
The Design And Implementation Of A Parallel Document Retrieval Engine
Footnote 1: Date paper completed. Publication as a technical report was delayed for various reasons. Summary: Document retrieval as traditionally formulated is an inherently parallel task because the document collection can be divided into N sub-collections, each of which may be searched independently. Document retrieval software can potentially exploit the power and capacity of a large-scale parallel ma...
A Parallel Document Retrieval Server For The World
An architecture is proposed which enables the Parallel Document Retrieval Engine (PADRE), running on a single-user Fujitsu AP1000 multicomputer, to operate as an information server on the World Wide Web. The advantages and disadvantages of a distributed memory parallel machine for this purpose are discussed and the likely applicability to different types of parallel machine is considered. Ideas...
A Parallel Document Retrieval Server For The World Wide Web
An architecture is proposed which enables the Parallel Document Retrieval Engine (PADRE), running on a single-user Fujitsu AP1000 multicomputer, to operate as an information server on the World Wide Web. The advantages and disadvantages of a distributed memory parallel machine for this purpose are discussed and the likely applicability to different types of parallel machine is considered. Ideas ...
PADRE for COWs
Earlier work with the Parallel Document Retrieval Engine was oriented toward parallel machines such as the AP1000, characterised by many nodes, few disks, small memory per node (by current standards), single-user operation and high communication performance, relative to node computational power. Present generation parallel machines are much more like clusters of workstations (COWs). There are t...
Searching For Meaning With The Help Of A PADRE
Full-text scanning offers significant advantages over other methods of document retrieval but is normally too slow for use on large collections. The Fujitsu AP1000 parallel distributed-memory machine has been used to reduce the time penalty for full-text scanning to acceptable interactive levels. The query language for the retrieval software (called PADRE) is described herein and differences betwe...